Runtime Data Flow Scheduling of Matrix Computations

Author

  • Ernie Chan
Abstract

We investigate the scheduling of matrix computations expressed as directed acyclic graphs for shared-memory parallelism. Because of the data granularity in this problem domain, even slight variations in load balance or data locality can greatly affect performance. Well-known scheduling algorithms such as work stealing have proven time and space bounds, but these bounds do not provide a discernible indicator of relative performance between different scheduling algorithms and heuristics. We provide a flexible framework for scheduling matrix computations, which we use to empirically quantify different scheduling algorithms. By building software solutions on hardware techniques, leveraging a cache coherence protocol, we develop a scheduling algorithm that addresses load balance and data locality simultaneously, and we show its performance benefits.
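The work-stealing baseline the abstract mentions can be illustrated with a minimal sketch (hypothetical names, not the paper's framework): each worker owns a double-ended queue, pops its own newest tasks locally, and, when idle, steals the oldest task from another worker's queue.

```python
import collections

class WorkStealingScheduler:
    """Minimal illustration of work stealing: each worker owns a deque;
    the owner pops from one end, idle thieves steal from the other."""

    def __init__(self, n_workers):
        self.deques = [collections.deque() for _ in range(n_workers)]

    def submit(self, worker, task):
        self.deques[worker].append(task)  # push onto the owner's end

    def next_task(self, worker):
        d = self.deques[worker]
        if d:
            return d.pop()  # owner takes its newest task (good locality)
        # local deque empty: steal the oldest task from some victim
        for victim, vd in enumerate(self.deques):
            if victim != worker and vd:
                return vd.popleft()
        return None

# All work starts on worker 0; worker 1 obtains tasks only by stealing.
sched = WorkStealingScheduler(2)
for i in range(6):
    sched.submit(0, i)

executed = {0: [], 1: []}
while any(sched.deques):
    for w in range(2):
        t = sched.next_task(w)
        if t is not None:
            executed[w].append(t)
print(executed)
```

Note how the two ends of the deque separate concerns: local pops favor recently created (cache-warm) tasks, while steals take old tasks, which in divide-and-conquer workloads tend to represent large chunks of remaining work.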


Similar articles

Runtime Data Flow Graph Scheduling of Matrix Computations with Multiple Hardware Accelerators

In our previous work, we presented a systematic methodology for parallelizing dense matrix computations using a separation of concerns between the code that implements a linear algebra algorithm and a runtime system that exploits parallelism, for which only relatively simple scheduling algorithms were used to parallelize a wide range of dense matrix computations. We have extended t...


Scheduling algorithms-by-blocks on small clusters

The arrival of multicore architectures has generated an interest in reformulating dense matrix computations as algorithms-by-blocks, where submatrices are the units of data and computations with those blocks are the units of computation. Rather than executing such an algorithm directly, a directed acyclic graph (DAG) is generated at runtime and then scheduled by a runtime system like SuperMatrix. T...
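The algorithms-by-blocks idea described above can be sketched by enumerating the block-level tasks of a matrix multiply (a hedged illustration; `blocks` and `gemm_tasks` are hypothetical names, not code from the paper): partitioning an n x n matrix into nb x nb submatrices turns a single GEMM into one small GEMM task per (i, j, k) block triple.

```python
def blocks(n, nb):
    """Index ranges partitioning an n x n matrix into nb x nb submatrix blocks."""
    return [(i, min(i + nb, n)) for i in range(0, n, nb)]

def gemm_tasks(n, nb):
    """Enumerate the block tasks of C += A * B: one task per (i, j, k) triple.
    Each task reads blocks A[i,k] and B[k,j] and updates block C[i,j]."""
    grid = len(blocks(n, nb))
    return [("gemm", i, j, k)
            for i in range(grid)
            for j in range(grid)
            for k in range(grid)]

tasks = gemm_tasks(8, 4)  # a 2x2 grid of blocks yields 8 block-level GEMM tasks
print(len(tasks))
```

Tasks sharing a C[i,j] block (same i and j, different k) must be ordered, which is exactly the dependence information the DAG captures.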


An Overview of the RAPID Run-time System for Parallel Irregular Computations

RAPID is a run-time system that uses an inspector/executor approach to parallelize irregular computations by embodying graph scheduling techniques to optimize interleaved communication and computation with mixed granularities. It provides a set of library functions for specifying irregular data objects and tasks that access these objects, extracts a task dependence graph from data access patter...
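The inspector/executor approach described above can be sketched in a few lines (hypothetical names; a simple last-writer analysis stands in for RAPID's actual dependence extraction): the inspector derives task-to-task dependencies from declared data accesses, and the executor runs each task once all its dependencies have completed.

```python
def inspect(tasks):
    """Inspector: derive dependencies from declared data accesses.
    A task depends on the most recent earlier task that wrote data it touches."""
    last_writer = {}
    deps = {t_id: set() for t_id, _, _ in tasks}
    for t_id, reads, writes in tasks:
        for d in reads | writes:
            if d in last_writer:
                deps[t_id].add(last_writer[d])
        for d in writes:
            last_writer[d] = t_id
    return deps

def execute(tasks, deps):
    """Executor: repeatedly run any task whose dependencies are all done."""
    done, order = set(), []
    pending = list(tasks)
    while pending:
        for t in pending:
            t_id = t[0]
            if deps[t_id] <= done:  # all dependencies completed
                order.append(t_id)
                done.add(t_id)
                pending.remove(t)
                break
    return order

# Tasks declared as (id, reads, writes) over named data objects.
tasks = [("t1", set(), {"x"}), ("t2", {"x"}, {"y"}), ("t3", {"x"}, {"z"})]
deps = inspect(tasks)
order = execute(tasks, deps)
print(order)
```

Separating the two phases is the key point: the inspection cost is paid once, and the resulting graph can be reused across many executions of the same irregular computation.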


The Foundations of Thread-level Parallelism in the SuperMatrix Runtime System

In this paper, we describe the interface and implementation of the SuperMatrix runtime system. SuperMatrix exploits parallelism from matrix computations by mapping a linear algebra algorithm to a directed acyclic graph (DAG). We give detailed descriptions of how to dynamically construct a DAG where tasks consisting of matrix operations represent the nodes and data dependencies between tasks rep...
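The dispatch step of a DAG-scheduling runtime like the one described can be sketched with dependency counting (an illustrative sketch, not SuperMatrix's implementation): each task tracks how many predecessors remain, and it joins the ready queue when that count reaches zero.

```python
from collections import deque

def schedule(dag):
    """Run a task DAG by dependency counting: a task becomes ready when its
    count of unfinished predecessors drops to zero.
    dag maps each task name to the list of tasks it depends on."""
    indegree = {t: len(preds) for t, preds in dag.items()}
    succs = {t: [] for t in dag}
    for t, preds in dag.items():
        for p in preds:
            succs[p].append(t)  # reverse edges: predecessor -> successors
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)  # "execute" the task
        for s in succs[t]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return order

# A tiny DAG in the spirit of one blocked Cholesky step:
# factor the diagonal block, solve below it, then update the trailing block.
dag = {"chol": [], "trsm": ["chol"], "syrk": ["trsm"]}
result = schedule(dag)
print(result)
```

In a real runtime the ready queue is shared among worker threads and the ordering heuristic for that queue (FIFO, locality-aware, priority-based) is exactly where the scheduling policies compared in these papers differ.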


Scaling Up Matrix Computations on Shared-Memory

While the growing number of cores per chip allows researchers to solve larger scientific and engineering problems, the parallel efficiency of the deployed parallel software starts to decrease. This scalability problem affects both vendor-provided and open-source software and wastes CPU cycles and energy. Expecting CPUs with hundreds of cores to be imminent, we have designed a new framewo...




Publication date: 2009